Skip to content

fix(auth): replace raw Cookie header with httpx cookie jar to fix #273#276

Open
AmanSuryavanshi-1 wants to merge 5 commits intoteng-lin:mainfrom
AmanSuryavanshi-1:fix/issue-273-cross-domain-cookies
Open

fix(auth): replace raw Cookie header with httpx cookie jar to fix #273#276
AmanSuryavanshi-1 wants to merge 5 commits intoteng-lin:mainfrom
AmanSuryavanshi-1:fix/issue-273-cross-domain-cookies

Conversation

@AmanSuryavanshi-1
Copy link
Copy Markdown

@AmanSuryavanshi-1 AmanSuryavanshi-1 commented Apr 13, 2026

Closes #273

Fixes Issue #273 where short-lived 10 minute Google cookies triggering a 302 redirect to accounts.google.com would drop the raw Cookie headers and kill the session. Migrated the auth fetchers and the RPC ClientCore to use a persistent httpx.Cookies jar.

Summary

Problem: Auth fails after ~10 minutes due to raw Cookie header being dropped on cross-domain redirects (Google's cookies expire every 10-15 minutes and trigger a 302 to accounts.google.com to refresh them).

Solution: Replace raw Cookie: header with httpx.Cookies() jar which properly handles cross-domain redirects.

Changes Made

  1. src/notebooklm/auth.py - fetch_tokens(): Uses httpx.Cookies() jar with domain=".google.com"
  2. src/notebooklm/_core.py - ClientCore.open(): Uses cookies jar for HTTP client
  3. src/notebooklm/_core.py - update_auth_headers(): Replaces cookies jar on token refresh
  4. Added _build_cookies_jar() helper method

Test Plan

  • All unit tests pass (179 tests)
  • All integration tests pass
  • Pre-commit checks pass (ruff, mypy)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of authentication cookies across domains to avoid session loss and preserve existing cookies during updates
    • Broadened Google auth domain support to reduce redirect/login failures
    • Persist refreshed cookies back to storage so auth updates survive restarts
  • Tests

    • Added integration tests verifying cookies are preserved and merged (not replaced) during auth updates

…g-lin#273

Cross-domain redirects (e.g., to accounts.google.com for token refresh)
now properly retain cookies by using httpx.Cookies() jar instead of
raw Cookie header. This fixes auth failures after ~10 minutes when
Google's short-lived cookies expire and trigger a 302 redirect.

Changes:
- auth.py: fetch_tokens() now uses httpx.Cookies() jar
- _core.py: ClientCore.open() uses cookies jar for HTTP client
- _core.py: update_auth_headers() replaces cookies jar on token refresh
- _core.py: add _build_cookies_jar() helper for cookie jar creation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 13, 2026 08:01
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replaces manual "Cookie" request-header usage with httpx.Cookies jars across client and auth flows; ClientCore builds/uses a cookie jar and merges cookies into the client's jar. Auth helpers now extract/build cookies with original domains (including accounts.google.com) and token fetches use cookie jars. Tests added for cookie-preservation.

Changes

Cohort / File(s) Summary
HTTP Client Core
src/notebooklm/_core.py
Create httpx.AsyncClient with an httpx.Cookies jar via new ClientCore._build_cookies_jar(); update_auth_headers() now merges cookies into self._http_client.cookies instead of setting a raw Cookie header; persisted cookies saved on close.
Authentication & Cookie helpers
src/notebooklm/auth.py
Add extract_cookies_with_domains(), build_httpx_cookies_from_storage(), save_cookies_to_storage(), and fetch_tokens_with_domains(); include accounts.google.com in allowlist; fetch_tokens() now uses an httpx.Cookies jar rather than constructing a raw Cookie header; AuthTokens records storage_path.
CLI helpers
src/notebooklm/cli/helpers.py, src/notebooklm/cli/session.py
get_auth_tokens() supplies storage_path into AuthTokens; minor reformat of OSError console.print message when saving storage_state.json (no behavior change).
Integration tests
tests/integration/test_core.py
Add TestCrossDomainCookiePreservation with tests ensuring update/merge behavior preserves pre-existing cross-domain cookies and does not replace them during update_auth_headers().

Sequence Diagram(s)

sequenceDiagram
  participant ClientCore as ClientCore
  participant Auth as Auth Helpers
  participant HTTPX as httpx.AsyncClient
  participant Google as Google (notebooklm / accounts)

  ClientCore->>Auth: build_httpx_cookies_from_storage(path)
  Auth-->>ClientCore: httpx.Cookies jar (domains preserved)
  ClientCore->>HTTPX: AsyncClient(cookies=jar)
  ClientCore->>HTTPX: GET https://notebooklm.google.com/ (follow_redirects)
  HTTPX->>Google: GET notebooklm.google.com (sends domain cookies)
  Google-->>HTTPX: 302 -> accounts.google.com
  HTTPX->>Google: GET accounts.google.com (sends appropriate jar cookies)
  Google-->>HTTPX: final HTML (200)
  HTTPX-->>ClientCore: response HTML
  ClientCore->>ClientCore: extract CSRF/session tokens
  ClientCore->>HTTPX: update_auth_headers() -> merges/sets cookies on client jar
  ClientCore->>Auth: save_cookies_to_storage(http_client.cookies, storage_path)
  Auth-->>ClientCore: persisted storage_state.json (if applicable)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I built a jar of crumbs so sly,
That hop through redirects, never shy.
From notebook to accounts they glide,
Safe through domains, side by side.
Cookies kept—now let the session fly! 🍪✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: replacing raw Cookie headers with httpx cookie jar to fix issue #273.
Linked Issues check ✅ Passed The PR fully addresses issue #273 by replacing raw Cookie headers with httpx.Cookies jars in auth.py and _core.py, including accounts.google.com in allowed domains, and persisting refreshed cookies to survive cross-domain redirects.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing issue #273: auth cookie handling, cookie jar construction, persistence, and supporting test/CLI updates. No unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 92.31% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request transitions the HTTP client from using raw 'Cookie' headers to 'httpx.Cookies' jars to support cross-domain redirects during authentication. Feedback indicates that forcing the '.google.com' domain on all cookies may break functionality for other domains like '.googleusercontent.com' and poses a security risk regarding credential leakage. Additionally, the current implementation of updating auth headers may inadvertently overwrite session state provided by the server, and the logic for building the cookie jar is duplicated and should be centralized.

Comment thread src/notebooklm/_core.py
Comment on lines +193 to +196
for name, value in self.auth.cookies.items():
# Use .google.com domain to cover all subdomains including
# accounts.google.com (used for token refresh redirects)
cookies.set(name, value, domain=".google.com")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

Forcing the ".google.com" domain on all cookies is a problematic side effect of using a flat dictionary (self.auth.cookies) to populate the cookie jar. While this ensures cookies are sent to accounts.google.com during redirects, it also means that cookies originally intended for other domains—such as .googleusercontent.com (which is used for media downloads and is explicitly allowed in auth.py)—will now be sent to .google.com and not to their intended destination. This will likely break authenticated downloads from Google's content servers. Additionally, ensure that when making requests to external URLs using these cookies, the domain is validated against an allowlist of trusted domains to prevent credential leakage.

References
  1. When making requests to external URLs using authentication cookies, always validate that the URL's domain is on an allowlist of trusted domains to prevent credential leakage.

Comment thread src/notebooklm/_core.py Outdated
raise RuntimeError("Client not initialized. Use 'async with' context.")
self._http_client.headers["Cookie"] = self.auth.cookie_header
# Replace cookies jar with updated auth tokens
self._http_client.cookies = self._build_cookies_jar()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Re-assigning the cookie jar here will overwrite any cookies that the server might have updated or set during the session (e.g., via Set-Cookie headers in previous requests). Since self.auth.cookies is a static dictionary initialized at startup and never updated, this call effectively reverts the client's session state to its initial values. Given that csrf_token and session_id are handled separately and are not stored in this cookie jar, this reset may be unnecessary and potentially harmful to session persistence.

Comment thread src/notebooklm/auth.py
Comment on lines +666 to +668
cookie_jar = httpx.Cookies()
for name, value in cookies.items():
cookie_jar.set(name, value, domain=".google.com")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for building a cookie jar is duplicated in ClientCore._build_cookies_jar. It should be centralized (e.g., as a method on AuthTokens) to ensure consistent behavior. Additionally, as noted in the core client review, forcing the .google.com domain for all cookies may break functionality for other domains like googleusercontent.com which are used for media downloads.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates NotebookLM authentication HTTP handling to use httpx.Cookies instead of a raw Cookie header, aiming to preserve session cookies across Google redirect flows (notebooklm → accounts → notebooklm) described in Issue #273.

Changes:

  • Switch auth.fetch_tokens() to build and pass an httpx.Cookies jar (vs. a raw Cookie: header).
  • Switch ClientCore.open() to configure the shared httpx.AsyncClient with a cookies jar (vs. headers["Cookie"]).
  • Update ClientCore.update_auth_headers() to refresh the client’s cookies via a new _build_cookies_jar() helper.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/notebooklm/auth.py Builds an httpx.Cookies jar for token-fetching requests that may redirect across Google hosts.
src/notebooklm/_core.py Configures the core httpx.AsyncClient to use a cookies jar and adds a helper to construct it from AuthTokens.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/notebooklm/auth.py
Comment on lines +664 to +668
# Build httpx.Cookies jar instead of raw header to properly handle
# cross-domain redirects (e.g., to accounts.google.com for refresh)
cookie_jar = httpx.Cookies()
for name, value in cookies.items():
cookie_jar.set(name, value, domain=".google.com")
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_tokens() builds a cookie jar from the filtered cookies dict, but extract_cookies_from_storage() currently excludes cookies whose domain is accounts.google.com (because _is_allowed_auth_domain() only allows ALLOWED_COOKIE_DOMAINS and regional .google.*). If the cross-host refresh flow depends on accounts-scoped cookies (as described in #273), they will never make it into this jar, so redirects to accounts.google.com can still fail. Consider expanding the auth-domain allowlist to include accounts.google.com (and possibly .accounts.google.com) or switching token fetching to use a domain-preserving httpx.Cookies source (similar to load_httpx_cookies()).

Suggested change
# Build httpx.Cookies jar instead of raw header to properly handle
# cross-domain redirects (e.g., to accounts.google.com for refresh)
cookie_jar = httpx.Cookies()
for name, value in cookies.items():
cookie_jar.set(name, value, domain=".google.com")
# Build an httpx.Cookies jar instead of a raw Cookie header so cookies
# participate in redirect handling. The input is a flattened name/value
# mapping, so domain specificity has already been lost by this point.
# To preserve current behavior while allowing redirects through the Google
# Accounts host, register each cookie for both the general Google domain
# and the concrete accounts.google.com host.
cookie_jar = httpx.Cookies()
for name, value in cookies.items():
cookie_jar.set(name, value, domain=".google.com")
cookie_jar.set(name, value, domain="accounts.google.com")

Copilot uses AI. Check for mistakes.
Comment thread src/notebooklm/_core.py Outdated
Comment on lines +177 to +180
if not self._http_client:
raise RuntimeError("Client not initialized. Use 'async with' context.")
self._http_client.headers["Cookie"] = self.auth.cookie_header
# Replace cookies jar with updated auth tokens
self._http_client.cookies = self._build_cookies_jar()
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_auth_headers() replaces the entire AsyncClient.cookies jar with a new jar rebuilt from self.auth.cookies. Since self.auth.cookies is not updated by refresh_auth() (it only updates CSRF/session_id), this can discard any cookies that were refreshed via Set-Cookie during redirect flows (e.g., the short-lived __Secure-*PSIDRTS cookies), undermining the goal of using a persistent cookie jar. Prefer keeping the existing jar and only mutating it when the cookie values actually change (or explicitly syncing refreshed cookies back into self.auth.cookies before rebuilding).

Copilot uses AI. Check for mistakes.
Comment thread src/notebooklm/auth.py
Comment on lines +670 to 675
async with httpx.AsyncClient(cookies=cookie_jar) as client:
response = await client.get(
"https://notebooklm.google.com/",
headers={"Cookie": cookie_header},
follow_redirects=True,
timeout=30.0,
)
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new redirect-handling behavior in fetch_tokens() is not directly asserted in tests. There are existing pytest_httpx patterns in this repo to inspect outgoing requests; adding an assertion that the redirected request to accounts.google.com still includes a Cookie header (and/or that the request chain completes when cookies are valid) would prevent regressions of #273.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/notebooklm/_core.py`:
- Around line 179-180: The current code replaces the httpx cookie jar object
(self._http_client.cookies = self._build_cookies_jar()), which drops any cookies
httpx added during requests or redirects; instead, call _build_cookies_jar() to
get the new cookies and merge them into the existing jar
(self._http_client.cookies) — e.g., iterate the cookies from
_build_cookies_jar() and set/update them into self._http_client.cookies
(preserving domain/path/attributes) rather than assigning a new httpx.Cookies
object.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7051df54-2ea6-4006-9bb5-cab3ef3ee544

📥 Commits

Reviewing files that changed from the base of the PR and between a997718 and 2990ba2.

📒 Files selected for processing (2)
  • src/notebooklm/_core.py
  • src/notebooklm/auth.py

Comment thread src/notebooklm/_core.py Outdated
…coping and lifetimes

- Add accounts.google.com to ALLOWED_COOKIE_DOMAINS for token refresh
- Fix update_auth_headers() to merge cookies instead of replacing (preserves
  live cookies received from redirects like accounts.google.com)
- Add extract_cookies_with_domains() to preserve original cookie domains
- Add build_httpx_cookies_from_storage() for proper domain-aware cookie jars
- Add fetch_tokens_with_domains() variant that uses original domains
- Add TestCrossDomainCookiePreservation tests for redirect cookie handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/test_core.py`:
- Around line 462-490: The test currently only checks cookie presence; to prove
merge (not replace) change test_update_auth_headers_merges_not_replaces to set a
unique sentinel value (e.g., "redirect-only-sentinel") on http_client.cookies
for "SID" (domain=".google.com") before calling core.update_auth_headers(), then
after calling core.update_auth_headers() assert that
http_client.cookies.get("SID", domain=".google.com") == "redirect-only-sentinel"
(and optionally assert other expected tokens still exist). Locate this logic
around NotebookLMClient usage and
core.update_auth_headers()/http_client.cookies.set/get to implement the stronger
equality assertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e59a70d-9893-4a16-bffd-abefc58686b8

📥 Commits

Reviewing files that changed from the base of the PR and between 2990ba2 and 3d90567.

📒 Files selected for processing (3)
  • src/notebooklm/_core.py
  • src/notebooklm/auth.py
  • tests/integration/test_core.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/notebooklm/_core.py
  • src/notebooklm/auth.py

Comment thread tests/integration/test_core.py Outdated
…tion

This addresses CodeRabbit review feedback on PR teng-lin#276 by asserting exact sentinel values survive update_auth_headers(), rather than just checking for 'not None' which would incorrectly pass if cookies were replaced.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/notebooklm/auth.py (2)

682-688: Misleading comment: prioritization logic is ineffective here.

The key is (name, domain), so ("SID", ".google.com") and ("SID", ".google.de") are distinct keys. The elif domain == ".google.com" branch at line 686 can never override a cookie from a different domain—it only handles duplicate entries with the same (name, domain) tuple, which is unlikely in practice.

If the intent is to preserve all domain-specific cookies (which appears correct for cross-domain redirect handling), the comment at line 683 ("Prefer .google.com over regional domains") is misleading. Consider clarifying or removing the dead prioritization branch.

🔧 Suggested simplification
         key = (name, domain)
-        # Prefer .google.com over regional domains
-        if key not in cookie_map:
-            cookie_map[key] = value
-        elif domain == ".google.com":
-            # .google.com takes precedence
-            cookie_map[key] = value
+        # Store each (name, domain) pair; duplicates within the same domain are rare
+        if key not in cookie_map:
+            cookie_map[key] = value
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/notebooklm/auth.py` around lines 682 - 688, The comment claiming
".google.com" is preferred is misleading because the map key is (name, domain)
so domain-specific cookies are stored separately; update the code by either
removing the dead prioritization branch (the "elif domain == '.google.com':
cookie_map[key] = value" block) and its misleading comment, or change the key
logic if you truly want to prefer .google.com (e.g., use name-only keys and
implement explicit precedence). Refer to the variables key, cookie_map, domain
and name to locate the code to modify and ensure the remaining comment
accurately reflects that domain-specific cookies are preserved.

794-798: Log the exception during temp file cleanup.

The try-except-pass silently swallows errors during temp file cleanup. While this is non-critical, logging helps with debugging unexpected permission or filesystem issues.

🔧 Proposed fix
-                try:
-                    temp_path.unlink(missing_ok=True)
-                except Exception:
-                    pass
+                try:
+                    temp_path.unlink(missing_ok=True)
+                except Exception as cleanup_err:
+                    logger.debug("Failed to clean up temp file %s: %s", temp_path, cleanup_err)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/notebooklm/auth.py` around lines 794 - 798, Replace the silent swallow in
the temp file cleanup so exceptions are logged: where
temp_path.unlink(missing_ok=True) is wrapped in try/except, change the except to
capture the exception (e) and log it instead of pass — e.g., call an appropriate
logger (module logger or logging.exception) to record the error and context
including temp_path; target the block around temp_path.unlink to implement this
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/notebooklm/auth.py`:
- Around line 874-896: The code reads and uses the httpx Response (final_url,
response.text) after closing the AsyncClient; move all uses of response —
computing final_url, calling is_google_auth_redirect(final_url), and calling
extract_csrf_from_html(response.text, final_url) and
extract_session_id_from_html(response.text, final_url) — inside the async with
httpx.AsyncClient(...) block so the response body is still valid; ensure
response.raise_for_status() remains, perform the redirect check and token
extractions before exiting the context, then log and return csrf, session_id.
- Around line 851-896: The function fetch_tokens_with_domains is unused; either
delete it or wire it into the auth flow—specifically replace or augment the
current token-loading path (e.g., AuthTokens.from_storage or the code in
download.py that loads tokens) to call fetch_tokens_with_domains(path) when you
need cookies preserved with original domains, or if you plan to defer, remove
the function and add a brief TODO in the PR description; ensure any callers
expect the returned tuple (csrf, session_id) and handle its exceptions
(FileNotFoundError, httpx.HTTPError, ValueError).

---

Nitpick comments:
In `@src/notebooklm/auth.py`:
- Around line 682-688: The comment claiming ".google.com" is preferred is
misleading because the map key is (name, domain) so domain-specific cookies are
stored separately; update the code by either removing the dead prioritization
branch (the "elif domain == '.google.com': cookie_map[key] = value" block) and
its misleading comment, or change the key logic if you truly want to prefer
.google.com (e.g., use name-only keys and implement explicit precedence). Refer
to the variables key, cookie_map, domain and name to locate the code to modify
and ensure the remaining comment accurately reflects that domain-specific
cookies are preserved.
- Around line 794-798: Replace the silent swallow in the temp file cleanup so
exceptions are logged: where temp_path.unlink(missing_ok=True) is wrapped in
try/except, change the except to capture the exception (e) and log it instead of
pass — e.g., call an appropriate logger (module logger or logging.exception) to
record the error and context including temp_path; target the block around
temp_path.unlink to implement this change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 38899296-865f-4292-85d0-62ea9b97c896

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecb9c9 and bbc175e.

📒 Files selected for processing (3)
  • src/notebooklm/_core.py
  • src/notebooklm/auth.py
  • src/notebooklm/cli/helpers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/notebooklm/_core.py

Comment thread src/notebooklm/auth.py
Comment on lines +851 to +896
async def fetch_tokens_with_domains(path: Path | None = None) -> tuple[str, str]:
"""Fetch CSRF token and session ID using storage with original domains.

Loads cookies from storage preserving their original domains and makes
an authenticated request. This version is preferred when you need proper
cross-domain redirect handling with the original cookie domains intact.

Args:
path: Path to storage_state.json. If provided, takes precedence over env vars.

Returns:
Tuple of (csrf_token, session_id)

Raises:
FileNotFoundError: If storage file doesn't exist.
httpx.HTTPError: If request fails.
ValueError: If tokens cannot be extracted from response.
"""
logger.debug("Fetching CSRF and session tokens with original cookie domains")

# Load cookies with original domains preserved
cookie_jar = build_httpx_cookies_from_storage(path)

async with httpx.AsyncClient(cookies=cookie_jar) as client:
response = await client.get(
"https://notebooklm.google.com/",
follow_redirects=True,
timeout=30.0,
)
response.raise_for_status()

final_url = str(response.url)

# Check if we were redirected to login
if is_google_auth_redirect(final_url):
raise ValueError(
"Authentication expired or invalid. "
"Redirected to: " + final_url + "\n"
"Run 'notebooklm login' to re-authenticate."
)

csrf = extract_csrf_from_html(response.text, final_url)
session_id = extract_session_id_from_html(response.text, final_url)

logger.debug("Authentication tokens obtained successfully")
return csrf, session_id
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for usages of fetch_tokens_with_domains
rg -n 'fetch_tokens_with_domains' --type=py

Repository: teng-lin/notebooklm-py

Length of output: 174


Remove or integrate fetch_tokens_with_domains() — the function is unused.

The new function defined at lines 851–896 is not called anywhere in the codebase. Either remove it or integrate it into the actual authentication flow (e.g., in download.py or AuthTokens.from_storage()). If it's planned for a follow-up, document that intention or defer the implementation to a separate PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/notebooklm/auth.py` around lines 851 - 896, The function
fetch_tokens_with_domains is unused; either delete it or wire it into the auth
flow—specifically replace or augment the current token-loading path (e.g.,
AuthTokens.from_storage or the code in download.py that loads tokens) to call
fetch_tokens_with_domains(path) when you need cookies preserved with original
domains, or if you plan to defer, remove the function and add a brief TODO in
the PR description; ensure any callers expect the returned tuple (csrf,
session_id) and handle its exceptions (FileNotFoundError, httpx.HTTPError,
ValueError).

Comment thread src/notebooklm/auth.py
Comment on lines +874 to +896
async with httpx.AsyncClient(cookies=cookie_jar) as client:
response = await client.get(
"https://notebooklm.google.com/",
follow_redirects=True,
timeout=30.0,
)
response.raise_for_status()

final_url = str(response.url)

# Check if we were redirected to login
if is_google_auth_redirect(final_url):
raise ValueError(
"Authentication expired or invalid. "
"Redirected to: " + final_url + "\n"
"Run 'notebooklm login' to re-authenticate."
)

csrf = extract_csrf_from_html(response.text, final_url)
session_id = extract_session_id_from_html(response.text, final_url)

logger.debug("Authentication tokens obtained successfully")
return csrf, session_id
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Response accessed after client context exit.

The response object is used (lines 882–893) after exiting the async with httpx.AsyncClient(...) block at line 880. While response.text is typically available because the body was fully read during response.raise_for_status(), this pattern is fragile and inconsistent with fetch_tokens() which extracts tokens inside the context.

🔧 Proposed fix: move extraction inside the context
     async with httpx.AsyncClient(cookies=cookie_jar) as client:
         response = await client.get(
             "https://notebooklm.google.com/",
             follow_redirects=True,
             timeout=30.0,
         )
         response.raise_for_status()

-    final_url = str(response.url)
+        final_url = str(response.url)

-    # Check if we were redirected to login
-    if is_google_auth_redirect(final_url):
-        raise ValueError(
-            "Authentication expired or invalid. "
-            "Redirected to: " + final_url + "\n"
-            "Run 'notebooklm login' to re-authenticate."
-        )
+        # Check if we were redirected to login
+        if is_google_auth_redirect(final_url):
+            raise ValueError(
+                "Authentication expired or invalid. "
+                "Redirected to: " + final_url + "\n"
+                "Run 'notebooklm login' to re-authenticate."
+            )

-    csrf = extract_csrf_from_html(response.text, final_url)
-    session_id = extract_session_id_from_html(response.text, final_url)
+        csrf = extract_csrf_from_html(response.text, final_url)
+        session_id = extract_session_id_from_html(response.text, final_url)

-    logger.debug("Authentication tokens obtained successfully")
-    return csrf, session_id
+        logger.debug("Authentication tokens obtained successfully")
+        return csrf, session_id
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/notebooklm/auth.py` around lines 874 - 896, The code reads and uses the
httpx Response (final_url, response.text) after closing the AsyncClient; move
all uses of response — computing final_url, calling
is_google_auth_redirect(final_url), and calling
extract_csrf_from_html(response.text, final_url) and
extract_session_id_from_html(response.text, final_url) — inside the async with
httpx.AsyncClient(...) block so the response body is still valid; ensure
response.raise_for_status() remains, perform the redirect check and token
extractions before exiting the context, then log and return csrf, session_id.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auth fails after ~10 min: raw Cookie header dropped on cross-domain redirects

2 participants